[SPARK-56911][SQL] Simplify Cast to decimal codegen under ANSI mode#55936
[SPARK-56911][SQL] Simplify Cast to decimal codegen under ANSI mode#55936gengliangwang wants to merge 3 commits into
Conversation
0beac87 to
f699004
Compare
f699004 to
8f72067
Compare
|
Audited this PR for the same lessons surfaced by @viirya and @cloud-fan on #55938 (and applied to #55934 / #55939):
So no changes needed on this PR for that review. |
8f72067 to
62ce0cd
Compare
Extend `CastUtils.java` with two helpers for decimal precision adjustment and use them from `Cast.changePrecision` (both the eval and codegen implementations). The new helpers mutate the input `Decimal` in place (matching the behavior of the existing inline codegen), so they're safe to call on the temporary produced by `Decimal.fromString(...)` / `Decimal.apply(...)` / decimal-arithmetic results. Helpers added: * `changePrecisionExact(Decimal, int, int, QueryContext)`: ANSI throw on overflow, preserves the per-call-site `QueryContext` so error messages keep their query-origin info. * `changePrecisionOrNull(Decimal, int, int)`: non-ANSI, returns `null` on overflow (no `QueryContext` needed). `Cast.scala` changes: * `changePrecision` eval method dispatches on `nullOnOverflow` and delegates to the appropriate helper. * `changePrecision` codegen method has three branches now: the existing `canNullSafeCast` fast path (unchanged), a `nullOnOverflow` branch (inline), and the ANSI throw branch which now emits a one-line `CastUtils.changePrecisionExact(...)` call instead of the 5-line `if/else` overflow block. Part of SPARK-56908 (umbrella). The ANSI throw branch of `Cast.changePrecision` is hit by every cast to decimal that may overflow (very common in TPC-DS, where `cast(int as decimal(7,2))` is widespread). Collapsing the 5-line inline body to one line shrinks the generated Java source for those plans. No. The compiled behavior is identical; only the emitted Java source text changes. ``` build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite \ *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite *DecimalSuite \ *ExpressionClassIdentitySuite" ``` 337/337 pass. Generated-by: Cursor 1.x
62ce0cd to
fd9228c
Compare
cloud-fan
left a comment
There was a problem hiding this comment.
Small, well-scoped refactor in the SPARK-56908 codegen-simplification series. The two new CastUtils helpers (changePrecisionExact / changePrecisionOrNull) wrap Decimal.changePrecision with throw/null overflow handling and are reused from both the eval path and the ANSI-throw codegen branch.
The conversation comment already audited the redundancy questions raised on #55938 (vs. Decimal.toPrecision, vs. one-liner eval path) and the answers hold up: toPrecision clones the input, the helpers mutate in place; the old eval block was 8 lines, not 1. Semantics check out — when Decimal.changePrecision returns false the input is left untouched (early returns in Decimal.changePrecision precede the field writes), so the error message still cites the original value, matching pre-PR behavior.
Findings below are all minor / housekeeping: stale class-level Javadoc, time-bound phrasing in a new section comment, and a now-redundant argument that the refactor naturally invites cleaning up.
| throw QueryExecutionErrors.castingCauseOverflowError(v, DOUBLE, SHORT); | ||
| } | ||
|
|
||
| // ----- decimal precision adjustment ----- |
There was a problem hiding this comment.
The class Javadoc above scopes this file as "ANSI overflow-checked narrowing to byte / short" and explicitly says "The helpers below cover byte / short only". That's no longer accurate once this section is added — the file now also hosts decimal precision adjustment, which is neither narrowing nor byte/short. Worth widening the header doc (e.g., drop the "byte/short only" claim and frame it as "ANSI overflow-checked operations used from Cast.doGenCode / eval paths"), or this PR's helpers will read as "shouldn't be here".
There was a problem hiding this comment.
Done — class Javadoc rewritten to drop the byte/short-only scope (now "ANSI overflow-checked casts" generally) and moved the byte/short narrowing limitation into an inline comment above the integral-narrowing section.
| // Mutates the input Decimal in place to apply the target precision/scale on | ||
| // the per-row hot path. | ||
|
|
There was a problem hiding this comment.
"Used by Cast.changePrecision (and by BinaryArithmetic / DivModLike in follow-up PRs)" describes the rollout, not the helpers themselves. Once the follow-up PRs land (or don't) the phrasing is wrong or misleading, and the caller enumeration drifts as the codebase evolves. The two facts a future reader actually needs are already in the first and last sentences: in-place mutation, and the hot-path rationale. Suggest dropping the middle:
| // Mutates the input Decimal in place to apply the target precision/scale on | |
| // the per-row hot path. | |
| // Mutates the input Decimal in place to avoid the per-row clone() done by | |
| // Decimal.toPrecision, since these helpers are called on the per-row hot path. |
There was a problem hiding this comment.
Adopted your suggestion. Also added a follow-up sentence (per @viirya's third note) pinning the invariant that Decimal.changePrecision returns false before any write-back, so d stays in its pre-cast state and the error message cites the original value.
| |} | ||
| """.stripMargin | ||
| } else { | ||
| val errorContextCode = getContextOrNullCode(ctx, !nullOnOverflow) |
There was a problem hiding this comment.
Pre-refactor this lived in a shared else block that handled both nullOnOverflow == true and nullOnOverflow == false, so !nullOnOverflow was load-bearing. Now this line is reached only inside the else branch where nullOnOverflow == false, so !nullOnOverflow is always true — which is the default of getContextOrNullCode. Can drop the second arg:
| val errorContextCode = getContextOrNullCode(ctx, !nullOnOverflow) | |
| val errorContextCode = getContextOrNullCode(ctx) |
There was a problem hiding this comment.
Done — dropped to getContextOrNullCode(ctx). You're right, with the else if (nullOnOverflow) branch split out, !nullOnOverflow is always true here, which is the default.
|
Thanks for the cleanup — refactor is semantically equivalent and the codegen shrink on the ANSI throw branch is a nice win. A few non-blocking nits:
Otherwise LGTM. |
…ecisionExact invariant, drop redundant getContextOrNullCode arg
|
@viirya thanks — all three addressed in
|
|
@cloud-fan @viirya thanks for the review. Merging to master/4.3 |
### What changes were proposed in this pull request? Extend `CastUtils.java` with two helpers for decimal precision adjustment and use them from `Cast.changePrecision` (both eval and codegen). The new helpers mutate the input `Decimal` in place (matching the in-place semantics of the existing inline codegen), so they're safe to call on the temporary produced by `Decimal.fromString(...)` / `Decimal.apply(...)` / decimal-arithmetic results. Helpers added: * `changePrecisionExact(Decimal, int, int, QueryContext)`: ANSI throw on overflow, preserves the per-call-site `QueryContext` so error messages keep their query-origin info. * `changePrecisionOrNull(Decimal, int, int)`: non-ANSI, returns `null` on overflow (no `QueryContext` needed). `Cast.scala` changes: * `changePrecision` eval method dispatches on `nullOnOverflow` and delegates to the appropriate helper. * `changePrecision` codegen method has three branches now: the existing `canNullSafeCast` fast path (unchanged), a `nullOnOverflow` branch (inline), and the ANSI throw branch which now emits a one-line `CastUtils.changePrecisionExact(...)` call instead of the 5-line `if/else` overflow block. ### Why are the changes needed? Part of SPARK-56908 (umbrella). The ANSI throw branch of `Cast.changePrecision` is hit by every cast to decimal that may overflow (very common in TPC-DS, where `cast(int as decimal(7,2))` is widespread). Collapsing the 5-line inline body to one line shrinks the generated Java source for those plans. ### Does this PR introduce _any_ user-facing change? No. The compiled behavior is identical; only the emitted Java source text changes. ### How was this patch tested? ``` build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite *DecimalSuite" ``` 332/332 pass. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor 1.x Closes #55936 from gengliangwang/SPARK-56911-cast-decimal. Authored-by: Gengliang Wang <gengliang@apache.org> Signed-off-by: Gengliang Wang <gengliang@apache.org> (cherry picked from commit 8af1b8b) Signed-off-by: Gengliang Wang <gengliang@apache.org>
What changes were proposed in this pull request?
Extend
CastUtils.javawith two helpers for decimal precision adjustment and use them fromCast.changePrecision(both eval and codegen). The new helpers mutate the inputDecimalin place (matching the in-place semantics of the existing inline codegen), so they're safe to call on the temporary produced byDecimal.fromString(...)/Decimal.apply(...)/ decimal-arithmetic results.Helpers added:
changePrecisionExact(Decimal, int, int, QueryContext): ANSI throw on overflow, preserves the per-call-siteQueryContextso error messages keep their query-origin info.changePrecisionOrNull(Decimal, int, int): non-ANSI, returnsnullon overflow (noQueryContextneeded).Cast.scalachanges:changePrecisioneval method dispatches onnullOnOverflowand delegates to the appropriate helper.changePrecisioncodegen method has three branches now: the existingcanNullSafeCastfast path (unchanged), anullOnOverflowbranch (inline), and the ANSI throw branch which now emits a one-lineCastUtils.changePrecisionExact(...)call instead of the 5-lineif/elseoverflow block.Why are the changes needed?
Part of SPARK-56908 (umbrella). The ANSI throw branch of
Cast.changePrecisionis hit by every cast to decimal that may overflow (very common in TPC-DS, wherecast(int as decimal(7,2))is widespread). Collapsing the 5-line inline body to one line shrinks the generated Java source for those plans.Does this PR introduce any user-facing change?
No. The compiled behavior is identical; only the emitted Java source text changes.
How was this patch tested?
332/332 pass.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.x